Describing Images using Inferred Visual Dependency Representations
Authors
Abstract
The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR parsing model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use the successful detections to produce training data. The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model that uses the predicted VDR. The performance of our approach is comparable to that of a state-of-the-art multimodal deep neural network on images depicting actions.
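The abstract describes two steps: turning successful detections of described objects into VDR training data, and generating a sentence from a VDR with templates. The Python sketch below illustrates the shape of both steps under loudly stated assumptions. The `Box` type, the helpers `overlap_fraction`, `spatial_relation`, `training_triples` and `realize`, the relation labels, the angle ranges, and the 50% / 90% overlap thresholds are all hypothetical and are not taken from the paper. The sketch also replaces the learned VDR parsing model with fixed geometric rules and realizes only a single subject-relation-object edge.

```python
# Hypothetical sketch: relation labels, thresholds and helper names are
# illustrative assumptions, not the paper's actual definitions.
import math
from dataclasses import dataclass


@dataclass
class Box:
    """A detected object: label plus bounding box in image coordinates (y grows downward)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def centroid(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def overlap_fraction(a: Box, b: Box) -> float:
    """Fraction of a's area that lies inside b."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if w <= 0 or h <= 0 or a.area() == 0:
        return 0.0
    return (w * h) / a.area()


def spatial_relation(a: Box, b: Box) -> str:
    """Label where a sits relative to b, using assumed geometric rules."""
    frac = overlap_fraction(a, b)
    if frac > 0.9:  # assumed threshold: a lies almost entirely within b
        return "inside"
    if frac > 0.5:  # assumed threshold: a mostly overlaps b
        return "on"
    (ax, ay), (bx, by) = a.centroid(), b.centroid()
    # Flip the y difference so the angle follows the usual "y points up" convention.
    angle = math.degrees(math.atan2(by - ay, ax - bx)) % 360
    if 45 < angle < 135:
        return "above"
    if 225 < angle < 315:
        return "below"
    return "beside"


def training_triples(mentioned: set[str], detections: list[Box]) -> list[tuple[str, str, str]]:
    """Keep detections whose label is mentioned in the description, then label
    every ordered pair with a relation (one assumed way to build training data)."""
    found = [d for d in detections if d.label in mentioned]
    return [(a.label, spatial_relation(a, b), b.label)
            for a in found for b in found if a is not b]


# Assumed surface phrase for each relation label.
PHRASES = {"inside": "in", "on": "on", "above": "above", "below": "below", "beside": "beside"}


def realize(subject: str, relation: str, obj: str) -> str:
    """Fill one fixed template from a single (subject, relation, object) edge."""
    return f"A {subject} is {PHRASES[relation]} the {obj}."


if __name__ == "__main__":
    detections = [Box("man", 100, 50, 200, 300),
                  Box("bike", 180, 150, 350, 300),
                  Box("tree", 0, 0, 50, 50)]  # not mentioned, so it is ignored
    triples = training_triples({"man", "bike"}, detections)
    print(triples)
    print(realize(*triples[0]))
```

Running the script prints `[('man', 'beside', 'bike'), ('bike', 'beside', 'man')]` followed by `A man is beside the bike.`. The unmentioned tree detection is discarded, which loosely mirrors the idea of keeping only successful detections of objects named in the description.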
Similar resources
Image Description using Visual Dependency Representations
Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the o...
Query-by-Example Image Retrieval using Visual Dependency Representations
Image retrieval models typically represent images as bags-of-terms, a representation that is well-suited to matching images based on the presence or absence of terms. For some information needs, such as searching for images of people performing actions, it may be useful to retain data about how parts of an image relate to each other. If the underlying representation of an image can distinguish b...
A Treebank of Visual and Linguistic Data
The treebank is a new resource for researchers working at the intersection between vision and language. It will be a freely available corpus of images and corresponding text for the development and evaluation of models for natural language generation, image annotation, and structure induction. The treebank differs from existing datasets because it contains syntactic representations of the data,...
Face Shape Recovery from a Single Image View
The problem of acquiring surface models of faces is an important one with potentially significant applications in biometrics, computer games and production graphics. For such a task, the use of shape-from-shading (SFS) is appealing since it is a non-invasive method that mimics the capabilities of the human visual system. In this thesis, our interest lies in the recovery of facial shape from singl...
The Rhetorical - Aesthetic Approach to Constructing the Relation between Images and Visual Inventions with Global Politics
Images and photos play an important role in our understanding of domestic and international events. Today we are living in the age of the visualization of politics. The images are vague, rhetorical, and aesthetic components of political and social phenomena and can give them a beautiful or detestable structure. In the digital age, images in and of themselves can define our structure and vision ...